SYSTEM AND METHOD FOR ENHANCING DOCUMENT
TRANSLATABILITY

BACKGROUND OF THE INVENTION 

I. Related Application 

This application is related to concurrently filed applications titled "System and
Method for Network-based Teletranslation", Attorney Docket No. 00184.0002,
commonly assigned, and "System and Method for Internet-based Translation
Brokerage Services", Attorney Docket No. 00184.0004, commonly assigned, and
incorporates the commonly assigned applications by reference in their entirety for all
purposes.

II. Field of the Invention 

The present invention relates generally to language translation and, more 
specifically, to a system and method for enhancing document translatability.

III. Description of the Related Art

Today, as more and more businesses operate across international borders, they
are often required to conduct business in more than one language. Also, businesses
often encounter a need for translating documents from one natural language to another
natural language.

In the past, businesses have utilized human-based translation (HT) to translate
documents. Although HT generally produces high quality work, it is inherently slow,
labor intensive, and often expensive. Human translators are quite often specialists in
a given language pair (e.g., English/French). Hence, there is a limit on how the
human translators can be allocated to different translation tasks, which results in a
certain rigidity for a business employing them.

Because HT is labor intensive, it is difficult to scale up when need increases
and difficult to scale down when need decreases. The capacity of any group of
translators is fairly well defined. When a sudden need arises to increase the capacity
for a particular language pair, adding translators to the process creates various
problems, such as harmonizing different styles, sharing glossaries and context
information, and merging translated text.

The document to be translated is often submitted to the translators in different
formats, for example, computer printouts, faxes, word processing files, email
attachments, and web pages. The translators are then left to handle the formats and
extract the translatable contents. While requesters of translation services prefer that the
translated document be in the same format as it was originally submitted, this is often
not possible, because different translators have varying technical skills and often are
unable to reformat the translated document into the original format.

For these reasons, machine translation software programs, also known as
machine translation engines, have been developed to provide computerized
translations. Today, the term Machine Translation (MT) is widely used in the
industry to refer to computerized systems that translate documents from one natural
language to another, with or without human assistance. It is important to note that the
term MT does not include computer-based tools that support translators by providing
access to dictionaries and terminology databases, or tools that facilitate the
transmission and reception of machine-readable texts, or tools that interact with word
processing, text editing or printing equipment. The term MT does, however, include
systems in which translators or other users assist computers in the production of
translations, including combinations of text preparation, on-line interactions and
subsequent revisions of machine-translated documents.

While MT engines are useful, they have several disadvantages. MT engines
are typically programmed to handle documents having only certain types of formats.
For example, some MT engines accept rich text format (RTF), while others accept
only ASCII files. As a result, businesses often are forced to turn down translation
jobs because their MT engines cannot handle a particular format or, at best, must implement
a non-trivial way of extracting the text for translation from the format information and
reinserting the translated text back into the format information.

Documents sent to MT engines typically are composed of various types of
information, e.g., text, graphics, diagrams, formatting information, hyperlinks, etc.
Not all MT engines are equal in handling text, graphics, hyperlinks, etc. Some MT
engines, for instance, are not able to identify hyperlinks, while others miss formatting
tags.

Furthermore, the text itself may contain information of a more circumstantial
nature, for example, circumstances relating to a specific time or a place. The phrase
"Les Bouches du Rhône sont ravagées par le feu" should not be translated into "The
mouths of the gutter are harrowed by fire" ("Les Bouches du Rhône" is the name of a
small region in the south of France). Likewise, the phrase "Kohl hat alles verloren"
should not be translated into "Cabbage has lost it all" (Kohl is a former German
chancellor). In general, MT engines do not deal with these special problems
efficiently. If a MT engine is to be programmed to handle these special problems, it
will necessitate adding many new lines of code to the MT engine. It will require
having access to the source code and/or the necessary programming interfaces of the
MT engine. It will also require that the code be constantly updated to take into
account the emergence of new cases. Adding additional code to the MT engine risks
making the translation process slower. Finally, code changes and additions will be
unique to each specific MT engine, requiring that the same kind of code changes and
additions be made over and over, once for each specific MT engine.

For these reasons, it has been recognized that there is a need for enhancing the
translatability of a document before submitting it to MT engines. There is a need for a
system and method that allows MT engines to handle a wide variety of formats.
Furthermore, there is a need for a system and method that allows MT engines to
efficiently translate information of a more circumstantial nature as described before,
and where the words used to express the circumstantial nature can vary widely and
quickly. Furthermore, there is a need to solve these special problems with no change
to the MT engines' code, and in a way that is applicable to many MT engines at once.

SUMMARY OF THE INVENTION

The present invention is directed to a teletranslation system and method for
enhancing document translatability. The teletranslation system translates a document
from one natural language to another. In one embodiment, the system comprises an
aggregate filter having a plurality of sections, each section performing a specific
process or processes on the document in a predetermined order, each section having at
least one atomic filter, and at least one MT engine for translating the processed
document. In one embodiment, the aggregate filter comprises a format conversion
section, a text improvement section, a word tagging section, and a translation section.
The aggregate filter analyzes the document based on a source text, format information,
and a target language.

The method for enhancing document translatability comprises processing the
document by an aggregate filter having a plurality of sections, each of the sections
processing the document in a predetermined order, each section having at least one
atomic filter, and translating the processed document by a MT engine. The method
further comprises changing the format of the document at a format conversion section,
modifying the text at a text improvement section, tagging words at a word tagging
section, and translating the document at a translation section. The method further
comprises preprocessing the document at the atomic filters in a first pass, and
post-processing it at the atomic filters in a second pass.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, like reference numbers generally indicate identical, 
functionally similar, and/or structurally similar elements. The drawing in which an 
element first appears is indicated by the leftmost digit(s) in the reference number. 
FIG. 1 illustrates an aggregate filter in accordance with one embodiment of
the present invention.
FIG. 2 illustrates a format conversion section.
FIG. 3 illustrates a text improvement section.
FIG. 4 illustrates a word tagging section.
FIG. 5 illustrates a translation section.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention is a system and method for enhancing document
translatability. In one embodiment, the system receives a translation document (i.e., a
document that needs to be translated). Depending on the content of the document, an
aggregate filter calls (or assembles) an array of other filters, e.g., atomic,
load-balancing or other aggregate filters, in a predetermined order.

Some of the atomic filters assembled by the aggregate filter are one-pass filters
while others are two-pass filters. A one-pass filter performs a preprocessing step in a
single pass. A two-pass filter performs both a preprocessing and a post-processing
step. In the first pass, the atomic filters in the array preprocess the document or a part
thereof in the predetermined order. The order in which each atomic filter carries out
the preprocessing steps depends on the most efficient and logical way to enhance the
translatability of the document. Once all preprocessing steps in the first pass are
completed, the document is then translated by a MT engine. The MT engine is also a
type of atomic filter. The translated document is then further processed, if
necessary, in the second pass. In the second pass, only the atomic filters having the
two-pass configuration carry out additional steps, referred to as post-processing. The
system finally outputs the translated document.
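
The two-pass flow described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the class names, the toy filters, the stand-in engine, and the choice to run the second pass in reverse order are all assumptions made for the example.

```python
from typing import Callable, List


class AtomicFilter:
    """Base class: a one-pass filter overrides preprocess() only."""

    def preprocess(self, text: str) -> str:
        return text

    def postprocess(self, text: str) -> str:
        # One-pass filters do nothing in the second pass.
        return text


class CollapseSpaces(AtomicFilter):
    """One-pass filter: normalizes runs of whitespace before translation."""

    def preprocess(self, text: str) -> str:
        return " ".join(text.split())


class KeepTrailingNewline(AtomicFilter):
    """Two-pass filter: remembers a trailing newline and restores it afterwards."""

    def __init__(self) -> None:
        self.had_newline = False

    def preprocess(self, text: str) -> str:
        self.had_newline = text.endswith("\n")
        return text.rstrip("\n")

    def postprocess(self, text: str) -> str:
        return text + "\n" if self.had_newline else text


class AggregateFilter:
    """Runs its filters in a fixed order around a call to the MT engine."""

    def __init__(self, filters: List[AtomicFilter], mt_engine: Callable[[str], str]) -> None:
        self.filters = filters
        self.mt_engine = mt_engine

    def run(self, text: str) -> str:
        for f in self.filters:            # first pass, in the predetermined order
            text = f.preprocess(text)
        text = self.mt_engine(text)       # the MT engine acts as one more filter
        for f in reversed(self.filters):  # second pass (reverse order is an assumption)
            text = f.postprocess(text)
        return text


def fake_engine(text: str) -> str:
    """Hypothetical stand-in for a real MT engine; it only upper-cases the text."""
    return text.upper()


pipeline = AggregateFilter([KeepTrailingNewline(), CollapseSpaces()], fake_engine)
result = pipeline.run("hello   world\n")
```

Keeping the engine behind a plain callable is one way to reflect the point that the same filters can be reused with different MT engines.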

The preprocessing and post-processing steps enhance the quality of the
translation. Without the preprocessing and post-processing steps, the MT engine
would be left to deal with the special problems, such as formats, tags, code set
detection, code set conversion and the circumstantial nature of the document. Since MT
engines are often ill equipped to handle these special problems, the translation would
be of poor quality, if the document were translated at all. The preprocessing and post-processing steps
compensate for the limitations of the MT engines, thereby enhancing the quality of the
translation.

As noted above, the types of filters assembled by the aggregate filter depend
on the type of the translation request and on the content of the translation document.
The aggregate filter analyzes a translation request and determines the types of filters
needed to perform the particular processing steps. The aggregate filter may assemble
atomic, load-balancing, and even other aggregate filters. It is important to note that
the atomic filter is the basic building block of all other filters. Thus, aggregate filters
and load-balancing filters are all built with atomic filters.

In one embodiment, each atomic filter is programmed to perform a specific
type of processing. The following examples are a few illustrations of processing by
the atomic filters.

(1) The text in the translation document may, for example, be converted from
one code set to another. For example, the code set may be converted from Shift-JIS to
UTF-8.

(2) Dates may be converted from one format to another. For example,
04/18/1999 may be changed to 1999-04-18.

(3) Monetary symbols may be replaced and tagged. For example, $ may be
changed to "Dollars Américains" and moved after the dollar amount. Monetary
amounts could also be replaced with their equivalent in a foreign currency, at a rate
that could be specified. For example, "$100" could be changed to "150 dollars (100
dollars Américains)".

(4) Proper names may be identified and syntactically defined. The words
"Bouches du Rhône", for example, can be identified as a single compound word and
defined as a proper noun.

(5) Names of works of art, for example, paintings, movies, books, etc., may be
tagged or directly translated outside the MT engines (using the actual names that they
were given in the target language).

(6) Names and words that are commonly used in specific regions, such as
names of places, people or groups, can also be tagged or directly translated. In some
cases, the preprocessing also involves some post-processing (for example, changing
dates back to their original format).
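
Example (2), together with the remark about changing dates back, can be sketched as a two-pass filter. This is an illustrative sketch only: the class name and the assumption that the MT engine leaves ISO-style dates untouched are not taken from the text.

```python
import re


class DateFormatFilter:
    """Two-pass filter: MM/DD/YYYY -> YYYY-MM-DD before translation, and back after."""

    US_DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")
    ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

    def __init__(self) -> None:
        # Data gathered in the first pass and reused in the second pass.
        self.converted = set()

    def preprocess(self, text: str) -> str:
        def to_iso(match):
            month, day, year = match.groups()
            iso = f"{year}-{month}-{day}"
            self.converted.add(iso)
            return iso

        return self.US_DATE.sub(to_iso, text)

    def postprocess(self, text: str) -> str:
        def to_us(match):
            if match.group(0) not in self.converted:
                return match.group(0)  # already ISO in the source; leave it alone
            year, month, day = match.groups()
            return f"{month}/{day}/{year}"

        return self.ISO_DATE.sub(to_us, text)


date_filter = DateFormatFilter()
before_mt = date_filter.preprocess("Due 04/18/1999.")
after_mt = date_filter.postprocess("Échéance 1999-04-18.")
```

Recording which dates were converted lets the second pass leave alone any date that was already in the target format in the source document.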

MT engines are typically not programmed to deal with these special problems.
Preprocessing and post-processing relieve the MT engines of the responsibility of
handling the special problems. As a result, preprocessing and post-processing
enhance the quality of the translation without changing the code of the MT engines.
Also, the MT engines translate the document faster and more efficiently.

Furthermore, preprocessing and post-processing outside the MT engines
allows the present invention to be used with more than one type of MT engine. In
other words, businesses can advantageously utilize the present invention with their
existing MT engines, without having to purchase a special type of MT engine, or
modify the code of any MT engine they own.

Preprocessing and post-processing result in a more uniform quality of
translation from various MT engines. This is due to the fact that not all MT engines are
equally efficient in dealing with special problems, such as different types of
formats and the circumstantial nature of some information. Solving these special
problems outside the MT engines allows all MT engines to perform with a higher level
of efficiency, and the overall result is higher quality translations that are more
consistent.

WO 00/63796 PCT/IB00/00490

If a MT engine has to be programmed to deal with these special problems, it
will require adding many new lines of code to the MT engine. It will also require
that the added code be frequently updated. This requires highly skilled personnel and
is thus expensive and slow. The added code is also unique to each specific MT
engine, requiring that the necessary code changes be made individually on each
engine, multiplying the costs and delays involved.

In one embodiment, the translation request comprises a source text, format
information, and a target language. In addition, the translation request optionally may
include a list of words that should not be translated and a list of pre-translated words.

As an example, the translation request may require the following processing
steps: format conversion, text improvement, and tag removal. Accordingly, the
aggregate filter will assemble an array of atomic filters designed to carry out the
required steps. FIG. 1 illustrates an aggregate filter 100. The aggregate filter 100
comprises four processing sections: a format conversion section 104, a text
improvement section 108, a word tagging section 112, and a translation section 116.

1. Format Conversion Section 

The format conversion section converts formats to enhance the translatability
of the document. FIG. 2 illustrates a format conversion section 200. In one
embodiment, the format conversion section 200 comprises a language and code set
detection filter 204, a code set conversion filter 208, a HTML clean-up filter 212, and
a file format conversion filter 216. These filters are further described below.

(i) Language and Code Set Detection Filter
The Language and Code Set Detection Filter detects the language and code set
of the document. This filter is called upon when the language and/or code set of a
document is unknown to the system. In one embodiment, the filter supports the
languages and code sets listed in Table 1 below. It should be understood that the
Language and Code Set Detection Filter can be configured to support other languages
and code sets that are not listed in Table 1.
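
The text does not say how detection is performed. One simple possibility, shown here purely as an assumed sketch, is trial decoding: try each candidate code set in turn and keep the first one that decodes the bytes without error. The function name and the candidate list are hypothetical, the codec names are those of the Python standard library, and language detection is not covered.

```python
from typing import Optional, Sequence


def detect_code_set(
    data: bytes,
    candidates: Sequence[str] = ("utf-8", "shift_jis", "euc_jp"),
) -> Optional[str]:
    """Return the first candidate code set that decodes `data` cleanly, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


utf8_guess = detect_code_set("日本語".encode("utf-8"))
sjis_guess = detect_code_set("日本語".encode("shift_jis"))
```

The order of the candidates matters, because permissive single-byte code sets accept almost any input; a practical detector would also weigh how plausible the decoded text is.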



Table 1 - Example of Languages and Suitable Code Sets

Language      Abbr          Suitable Code Sets
[illegible]   [illegible]   cp1252; cp850; Macintosh; utf8
Arabic        ar            cp1256; iso 8859-6; utf8
Basque        [illegible]   cp1252; cp850; Macintosh; utf8
Bulgarian     bg            iso 8859-5; utf8
Chinese       [illegible]   gb2312; hz; big5; utf8
Croatian      sh            cp1250; iso 8859-2; Macintosh-Croat; utf8
Czech         [illegible]   cp1250; iso 8859-2; utf8
Danish        da            cp1252; cp850; Macintosh; utf8
Dutch         nl            cp1252; cp850; Macintosh; utf8
English       en            cp1252; utf8
Estonian      et            iso 8859-4; utf8
Finnish       fi            cp1252; cp850; Macintosh; utf8
French        fr            cp1252; cp850; Macintosh; utf8
German        de            cp1252; cp850; Macintosh; utf8
Greek         el            cp1253; cp869; iso 8859-7; Macintosh-Greek; utf8
Hungarian     hu            cp1250; cp852; utf8
Italian       it            cp1252; cp850; Macintosh; utf8
Japanese      ja            euc-jp; iso 2022-jp; shift-jis; utf8
Korean        ko            ks c 5601; iso 2022-kr; utf8
Malay         ms            cp1252; cp850; Macintosh; utf8
Norwegian     no            cp1252; cp850; Macintosh; utf8
Polish        pl            cp1252; iso 8859-2; utf8
Portuguese    pt            cp1252; cp850; Macintosh; utf8
Russian       ru            cp1251; iso 8859-5; koi8-r; utf8
Spanish       es            cp1252; cp850; Macintosh; utf8
Swedish       sv            cp1252; cp850; Macintosh; utf8
Thai          th            tis 620; utf8
Turkish       tr            cp853; iso 8859-9; utf8

(ii) Code Set Conversion Filter

The Code Set Conversion Filter converts a code set to another code set. 
Examples of the various code sets which can be used to encode a given language can 
be found in Table 1 above. It should be understood that the filter may be configured 
to support other code sets. 
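
As a brief illustration of code set conversion, the Shift-JIS to UTF-8 case mentioned earlier can be written as a decode followed by an encode. The function name is hypothetical, and the codec names are those used by the Python standard library rather than the spellings in Table 1.

```python
def convert_code_set(data: bytes, source: str, target: str) -> bytes:
    """Re-encode `data` from the `source` code set into the `target` code set."""
    return data.decode(source).encode(target)


shift_jis_bytes = "日本語".encode("shift_jis")
utf8_bytes = convert_code_set(shift_jis_bytes, "shift_jis", "utf-8")
```

Both steps are strict by default, so a character that cannot be represented in the target code set raises an error instead of being silently dropped.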

(iii) HTML Clean-Up Filter
The HTML Clean-Up Filter removes non-standard HTML constructs before 
they are processed by the system. This filter also ensures that HTML tags are not 
embedded inside words. If HTML tags are embedded inside words, they may be 
misinterpreted by MT engines and wrongly translated as part of the data. Since 
HTML evolves constantly, with new constructs appearing periodically, and various 
dialects of HTML also being used, the HTML Clean-Up Filter can also deal with 
novel or indigenous constructs which are not supported yet by the system. 

(iv) File Format Conversion Filter 
The File Format Conversion Filter converts a translation request document to
and from an internal data format to preserve the original formatting of the document
after translation. In one embodiment, this filter supports the following file formats:
plain text; HTML; OTEXT; Microsoft Word RTF; Microsoft Word DOC; and Adobe
Acrobat PDF. It will be apparent to one skilled in the art that the File Format
Conversion Filter can be configured to support other file formats.

2. Text Improvement Section 

The text improvement section modifies the text to enhance the quality of the
translation. FIG. 3 illustrates a text improvement section 300. In one embodiment,
the text improvement section comprises a language marker filter 308, a reaccentuation
filter 312, a spell checker filter 316, a grammar checker filter 320, and a controlled
language filter 324. These filters are further described below.

(ii) Language Marker Filter 
The Language Marker Filter detects whether any part of a document is of a
different language than the rest of the document. If so, this filter tags that part of the
document for non-translation. In one embodiment, the tagged text can be translated
using an appropriate MT engine supporting the necessary language pair.

(iii) Reaccentuation Filter
The Reaccentuation Filter restores accented characters, where appropriate, in
text written in a language that uses accents but from which the accents are missing,
either because the user omitted them (for example, while typing French text on an
English keyboard) or because they were stripped away (for example, by transmission
through older 7-bit-only email systems). This filter improves the quality of translation
by ensuring that words are properly accented.
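
One way to picture reaccentuation is a lexicon lookup keyed on the unaccented form of each word, applied only when exactly one accented candidate exists. The sketch below assumes a tiny hand-made lexicon and hypothetical function names; it ignores capitalization and context, which a practical filter would need to handle.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, Set


def strip_accents(word: str) -> str:
    """Remove combining marks, e.g. 'Rhône' -> 'Rhone'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each unaccented, lower-cased form to the lexicon words that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in lexicon:
        index[strip_accents(word).lower()].add(word)
    return index


def reaccentuate(text: str, index: Dict[str, Set[str]]) -> str:
    """Replace a word only when the lexicon offers exactly one candidate."""

    def fix(match):
        word = match.group(0)
        candidates = index.get(word.lower(), set())
        if len(candidates) == 1:
            return next(iter(candidates))
        return word  # unknown or ambiguous: leave it for interactive review

    return re.sub(r"[^\W\d_]+", fix, text)


index = build_index(["Rhône", "ravagées", "ou", "où"])
fixed = reaccentuate("Les Bouches du Rhone sont ravagees", index)
```

The pair "ou" / "où" is deliberately left untouched because the lexicon alone cannot decide between the two.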

(iv) Spell Checker Filter 

The Spell Checker Filter corrects misspelled words automatically where there
is no ambiguity and interactively where there is, thus improving the quality of
translation.

(v) Grammar Checker Filter 

The Grammar Checker Filter corrects the grammar used in the text
automatically where there is no ambiguity and interactively where there is. Improper
grammar confuses MT engines and produces poor quality translation. This filter
ensures that the grammar is correct so that the MT engines can better translate the
text.

(vi) Controlled Language Filter 
The Controlled Language Filter corrects the style of the text where appropriate
to reduce ambiguity. This filter ensures that the text adheres to a particular language
style, thus promoting consistent translations.

3. Word Tagging Section 

The Word Tagging Section creates and uses various tags to instruct MT
engines how to deal with special problems. FIG. 4 illustrates a word tagging section
400 in accordance with the present invention. In one embodiment, the word tagging
section 400 comprises a linguistic tag filter 404, a do not translate filter 408, and a
pre-translate filter 412. These filters are further described below. It should be noted
that the filters of the Word Tagging Section have the ability to use generic tags that
will be accepted by any MT engine, or MT engine-specific tags that will be used only
with a given MT engine in order to achieve some specific result with that MT engine,
or to deal with some specific constraint or limitation of that MT engine.

(i) Linguistic Tag Filter 
The Linguistic Tag Filter tags words that should not be translated by MT 
engines or should at least be treated differently. For example, this filter tags names of 
persons, geographical names, dates, and addresses. 

(ii) Do Not Translate Filter

The Do Not Translate Filter tags a particular set of words, for example, proper 
nouns, that should not be translated. In one embodiment, this filter tags a particular 
set of words supplied prior to or with the translation request as not to be translated. 

(iii) Pre-Translate Filter

The Pre-Translate Filter translates words in a way that is predetermined and
specific to the document being processed. This filter tags words appropriately so that
the MT engine will not attempt to translate these words. In one embodiment, this
filter translates a particular set of words supplied prior to or with the translation
request and where there is no ambiguity in the translation.
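
The Do Not Translate and Pre-Translate filters can be pictured as a single placeholder scheme: protected terms are swapped for opaque tokens before the MT engine runs and are restored, or replaced by their supplied translation, afterwards. The class name and the `__TAGn__` token format are assumptions for illustration; a real MT engine may need its own engine-specific tag syntax to leave such tokens untouched.

```python
import re
from typing import Dict, Iterable


class WordTaggingFilter:
    """Two-pass filter: hides protected terms behind placeholders, then restores them."""

    def __init__(self, do_not_translate: Iterable[str], pre_translated: Dict[str, str]) -> None:
        # Each term maps to the text that must appear in the final output.
        self.outputs = {term: term for term in do_not_translate}
        self.outputs.update(pre_translated)
        self.saved: Dict[str, str] = {}

    def preprocess(self, text: str) -> str:
        self.saved = {}
        # Longest terms first, so a multi-word name wins over its parts.
        for term in sorted(self.outputs, key=len, reverse=True):

            def protect(match, term=term):
                token = f"__TAG{len(self.saved)}__"
                self.saved[token] = self.outputs[term]
                return token

            text = re.sub(r"\b" + re.escape(term) + r"\b", protect, text)
        return text

    def postprocess(self, text: str) -> str:
        for token, final_text in self.saved.items():
            text = text.replace(token, final_text)
        return text


tagger = WordTaggingFilter(["Kohl"], {"Le Petit Prince": "The Little Prince"})
tagged = tagger.preprocess("Kohl liest Le Petit Prince")
restored = tagger.postprocess(tagged.replace("liest", "reads"))
```

Here the `replace("liest", "reads")` call merely stands in for the MT engine translating the untagged remainder of the sentence.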

4. Translation Section

The Translation Section performs the actual translation.
FIG. 5 illustrates a translation section 500 in accordance with the present invention.
In one embodiment, the translation section 500 comprises a translation memory filter
504 and a MT engine filter 508.

(i) Translation Memory Filter

The Translation Memory Filter translates only those parts of the text for which it
already has an appropriate translation in its database, and marks them as
non-translatable. The translation memory alone is usually not sufficient to translate a
complete document. However, in concert with the MT engine, this filter helps produce
a higher quality translation.
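
A rough sketch of how a translation memory might cooperate with an MT engine is given below. It assumes sentence-level segments and exact-match lookup in a dictionary, neither of which is specified in the text; the function name and the stand-in engine are hypothetical.

```python
import re
from typing import Callable, Dict


def translate_with_memory(
    text: str,
    memory: Dict[str, str],
    mt_engine: Callable[[str], str],
) -> str:
    """Use stored translations where available and the MT engine elsewhere."""
    # Naive sentence split on whitespace that follows ., ! or ?
    segments = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    translated = []
    for segment in segments:
        if segment in memory:
            translated.append(memory[segment])     # already translated; skip the engine
        else:
            translated.append(mt_engine(segment))  # fall back to machine translation
    return " ".join(translated)


memory = {"Kohl hat alles verloren.": "Kohl has lost everything."}
output = translate_with_memory(
    "Kohl hat alles verloren. Das ist neu.",
    memory,
    lambda segment: f"[MT:{segment}]",  # stand-in that just marks engine output
)
```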

(ii) MT Engine Filter 

The MT Engine Filter corrects any idiosyncrasies that the MT engine may
have (e.g., removing hyphens, inserting too many spaces, etc.) and submits the text to the
MT engine itself. This filter manages the list of words that the MT engine identifies
as problematic or impossible to translate. This list is preserved with reference to the
original document along with the date and time the request was processed.

In one embodiment, the MT Engine Filter has, for testing purposes, a
configuration parameter to disable any call to the corresponding MT engine and pass
the data back to a client application. In another embodiment, this filter has a
configuration parameter to limit the number of concurrent requests it will accept for
each language pair. When the number of concurrent requests reaches the limit, the
MT engine filter sends a "server too busy" response to the caller. In one embodiment,
the MT Engine Filter supports a single MT engine from a single vendor, or possibly
multiple instantiations of a given MT engine from a single vendor. In another
embodiment, it supports various MT engines from different vendors, or possibly
multiple instantiations of various MT engines from different vendors.
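
The two configuration parameters described above can be illustrated with a small wrapper. This is a sketch under assumptions: the class name, the parameter names `max_concurrent` and `call_engine`, and the use of an exception to signal "server too busy" are all hypothetical choices.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Tuple


class ServerTooBusy(Exception):
    """Raised when a language pair is already at its concurrency limit."""


class MTEngineFilter:
    def __init__(
        self,
        engine: Callable[[str, str, str], str],
        max_concurrent: int = 2,
        call_engine: bool = True,
    ) -> None:
        self.engine = engine
        self.max_concurrent = max_concurrent
        self.call_engine = call_engine  # False = test mode, engine is never called
        self._active: Dict[Tuple[str, str], int] = defaultdict(int)
        self._lock = threading.Lock()

    def translate(self, text: str, source: str, target: str) -> str:
        if not self.call_engine:
            return text  # pass the data back unchanged
        pair = (source, target)
        with self._lock:
            if self._active[pair] >= self.max_concurrent:
                raise ServerTooBusy("server too busy")
            self._active[pair] += 1
        try:
            return self.engine(text, source, target)
        finally:
            with self._lock:
                self._active[pair] -= 1


def upper_engine(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for a vendor MT engine."""
    return text.upper()


live = MTEngineFilter(upper_engine, max_concurrent=1)
dry_run = MTEngineFilter(upper_engine, call_engine=False)
```

The lock protects only the counters, so the engine call itself runs outside the lock and requests for different language pairs do not block each other.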

It should be understood that the aggregate filter 100 can have other
combinations of filters depending on the type of translation request and the content of
the translation document. If, for example, the document does not need text
improvement and does not contain any pre-translated words, then the text
improvement section and the pre-translate filter are not necessary. In that case, the
aggregate filter 100 will contain only the format conversion section, the word tagging
section without the pre-translate filter, and the translation section. Other processing
sections and filters, not described in this document, may also be created depending on
the type of translation request and content of the document.
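
The selective assembly described above can be sketched as a function that builds a plan from the fields of a translation request. The `TranslationRequest` fields mirror the request contents named earlier in the description, but the field names, the `needs_text_improvement` flag, and the plan representation are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TranslationRequest:
    source_text: str
    format_info: str
    target_language: str
    do_not_translate: List[str] = field(default_factory=list)
    pre_translated: Dict[str, str] = field(default_factory=dict)
    needs_text_improvement: bool = True  # hypothetical flag, not from the source


def assemble(request: TranslationRequest) -> List[Tuple[str, List[str]]]:
    """Return the sections, and the filters inside each, in processing order."""
    plan: List[Tuple[str, List[str]]] = [
        ("format_conversion",
         ["code_set_detection", "code_set_conversion", "html_clean_up", "file_format"]),
    ]
    if request.needs_text_improvement:
        plan.append(
            ("text_improvement",
             ["language_marker", "reaccentuation", "spell_checker",
              "grammar_checker", "controlled_language"])
        )
    tagging = ["linguistic_tag", "do_not_translate"]
    if request.pre_translated:
        tagging.append("pre_translate")  # only needed when a glossary was supplied
    plan.append(("word_tagging", tagging))
    plan.append(("translation", ["translation_memory", "mt_engine"]))
    return plan


request = TranslationRequest("Bonjour", "plain text", "en", needs_text_improvement=False)
plan = assemble(request)
section_names = [name for name, _ in plan]
```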

The MT engine performs the bulk of the translation. MT engines are
commercially available and are manufactured by various vendors. It is important to
note that the MT engines are also considered atomic filters.

According to the present invention, the atomic filters process (i.e., preprocess
and/or post-process) the data in a predetermined order. The atomic filters assist the
MT engines in order to enhance the quality of translation, and do so externally without
requiring internal changes to the MT engines themselves. Each atomic filter performs
a specific task. Many of these atomic filters and their functionality have been
described above. The present invention assembles a variety of atomic filters to assist
MT engines in order to provide a high quality translation. Furthermore, the present
invention assembles filters selectively based on the type of specific translation request
and the content of the document.

In one embodiment where the invention is implemented using software, the
software may be stored in a computer program product and loaded into a computer
system using a removable storage drive or a hard drive. The software may be stored
in a CD-ROM, a floppy disk or any other type of storage device.
In another embodiment, the invention can be implemented primarily in
hardware using, for example, hardware components such as application specific
integrated circuits (ASICs). Implementation of such a hardware state machine so as to
perform the functions described herein will be apparent to persons skilled in the
relevant art(s). In yet another embodiment, the invention is implemented using a
combination of both hardware and software.

While various embodiments of the present invention have been described
above, it should be understood that they have been presented by way of example only,
and not limitation. Thus, the breadth and scope of the present invention should not be
limited by any of the above-described exemplary embodiments, but should be defined
only in accordance with the following claims and their equivalents.

WHAT IS CLAIMED IS:

1. A teletranslation system for enhancing document translatability, the
teletranslation system translating a document from one natural language to another,
comprising:
an aggregate filter having a plurality of sections, each of the sections
processing the document in a predetermined order, each section having at least one
atomic filter; and
a MT engine for translating the processed document.

2. The system as recited in claim 1, wherein the aggregate filter
comprises:
a format conversion section;
a text improvement section;
a word tagging section; and
a translation section.

3. The system as recited in claim 1, wherein the translation document
further comprises:
source text;
format information; and
target language.

4. The system as recited in claim 1, wherein the translation document
further comprises:
list of words that should not be translated; and
list of pretranslated words.

5. The system as recited in claim 1, wherein the aggregate filter comprises
one or more atomic filters.

6. The system as recited in claim 1, wherein the aggregate filter comprises
one or more aggregate filters.

7. The system as recited in claim 1, wherein the aggregate filter comprises
one or more load-balancing filters.

8. The system as recited in claim 1, wherein the aggregate filter comprises
a combination of one or more atomic, aggregate and load-balancing filters.

9. The system as recited in claim 1, wherein the atomic filters are
one-pass filters programmed to perform a preprocessing step in a single pass.

10. The system as recited in claim 1, wherein the atomic filters are
two-pass filters programmed to perform a preprocessing step and a post-processing step in
a first and a second pass, respectively.

11. The system as recited in claim 1, wherein specific data is gathered by
the two-pass filter during the preprocessing step in the first pass and this specific data
is used during the post-processing step in the second pass.

12. The system as recited in claim 1, wherein the atomic filters process the
document or a part thereof.

13. A method for enhancing document translatability of a teletranslation
system translating a document from one natural language to another, comprising the
steps of:
processing the document by an aggregate filter having a plurality of sections,
each of the sections processing the document in a predetermined order, each section
having at least one atomic filter; and
translating the processed document by a MT engine.

14. The method as recited in claim 13, further comprising the steps of:
changing the format of the document at a format conversion section;
modifying the text at a text improvement section;
tagging words at a word tagging section; and
translating the document at a translation section.

15. The method as recited in claim 13, further comprising the step of
preprocessing the document at the atomic filter in a first pass.

16. The method as recited in claim 13, further comprising the step of
post-processing the document at some atomic filter in a second pass.

17. The method as recited in claim 13, further comprising the step of
gathering specific data on the document at some atomic filters during the
preprocessing step of their first pass, and using such specific data during the
post-processing step of their second pass.

18. The method as recited in claim 13, further comprising the step of
processing the document or a part thereof at the atomic filter.

19. A program storage device readable by a machine, tangibly embodying a
program of instructions executable by the machine to perform method steps for
enhancing document translatability of a teletranslation system translating a document
from one natural language to another, the method comprising the steps of:
processing the document by an aggregate filter having a plurality of sections,
each of the sections processing the document in a predetermined order, each section
having at least one atomic filter; and
translating the processed document by a MT engine.

20. The program storage device as recited in claim 19, wherein the method
for enhancing document translatability further comprises the steps of:
changing the format of the document at a format conversion section;
modifying the text at a text improvement section;
tagging words at a word tagging section; and
translating the document at a translation section.

21. The program storage device as recited in claim 19, wherein the method
for enhancing document translatability further comprises the step of preprocessing
the document at the atomic filter in a first pass.

22. The program storage device as recited in claim 19, wherein the method
for enhancing document translatability further comprises the step of post-processing
the document at some atomic filter in a second pass.